ABSTRACT

This resource unit gives administrators at the 
elementary and secondary levels practical help in the area of 
assessment, evaluation and accountability. The first section deals 
with basic sources of information on models and conceptualizations of 
full program evaluations. The second section cites magazine articles 
and special monographs, which are shorter and more specific
treatments than the basic sources. Brief descriptions of products
available from agencies concerned with the evaluation of materials
and instruction constitute the third section. The fourth section
contains two selective bibliographies. Section five includes a 
variety of instruments and samples of working guidelines useful in 
collecting, analyzing, and interpreting evaluative data. The final 
section is a glossary of terms. (HP) 

FOREWORD



The ever-increasing demand at all levels of education for accountability to clients, taxpayers, teachers, administration, boards 
of education and the legislature has resulted in an emphasis on research and evaluation activities. This new perspective has
caused members of the educational establishment to seek ways of answering questions about accountability based on rational 
and, hopefully, empirically based data. 

The following unit has been developed to provide information to the educational community so that questions can be answered
in a more professional and knowledgeable manner. Several source documents are provided for reference, as well as models
which can be used, and a glossary of the most common terms used in connection with research, evaluation and accountability.

It is hoped that this unit will assist the members of the school community as they seek to develop a better program of instruction
for the young people of Illinois. This unit is also important as a representative document developed by cooperative effort
between the Office of the Superintendent of Public Instruction and another educational institution in the State.



Michael J. Bakalis 
Superintendent of Public Instruction 

INTRODUCTION



Unusual financial constraints and a changing social context have placed education in the public spotlight more than
before. Evaluation of individual student learning, of programs, of curricula, of school processes and transactions, of goals, and
of teaching behavior, is therefore urgently demanded. Students, parents, and taxpayers have always "evaluated" the schools,
but seldom with much urgency and almost never systematically. In a pluralistic society, rational men can disagree, particularly
about the important purposes and goals of education. This leads to a requirement of multiplicity in evaluative
approaches. However, educators (teachers, supervisors, and administrators) simply must see that a large part of the acts of
evaluation is kept in professional hands, and that means in their hands.

This booklet is intended to give practicing educators at the elementary and secondary levels some practical help in the
assessment-evaluation-accountability job. One of the questions we hear most is: "Where can I find a source that will tell me
about a method of evaluating ________?" The blank may be filled with "our curriculum" or with "my teaching" or
with "student learning in my course." The pages which follow put most emphasis on answering this question, although some
of the space has been used to give illustrations of useful instruments.

The first section deals with basic sources which can lead the reader to models and conceptualizations of the full evaluation
scene. The second section cites magazine articles and special monographs which should be available to the school person
wanting to do a systematic job. Many of these are shorter and more specific treatments than are the basic sources. Brief
descriptions of products available from agencies concerned with the evaluation of materials and instruction constitute the
third section. The fourth section contains two selective bibliographies, both of which are designed to complement other sections.
Section five includes a variety of instruments and samples of working guidelines useful in collecting, analyzing, and
interpreting evaluative data. The final section is a glossary of terms which readers may wish to glance through while studying
the material in other sections. Each part of the Resource Unit is capable of standing alone and also of serving as an integral
part of the total work.

Between the divisions are short "linking statements" which will help you see the connections among the different materials.
You may wish to look through these linking statements before going into any one section.

Gordon A. Hoke did most of the annotations and the connecting passages with review help from others. Ludwig W.
Nemeth contributed in locating sources and reviewing the work. The booklet has been tried out with some practicing school
people and they have found it helpful. We hope that you do also.



PART I - BASIC SOURCES 



Source: AFT-QUEST Program. Department of Research, American Federation of Teachers, 1012 14th Street N.W., 
Washington, D.C. 20005 

This relatively new service provides both reports and topical papers. The former are employed to publicize the results of
QUEST conferences and to report on major topics of broad appeal. Papers deal with more specific issues and are prepared on 
a regular schedule. 

Reports and papers alike range across several areas and cover issues other than those of evaluation and accountability. However,
each publication presents a type of evaluative comment on different aspects of the educational scene. For example,
Paper No. 12 is entitled, "The Paradigm for Accountability" and delves into the realm of teacher education. "Quality 
Teaching," Paper No. 3, also contains a brief statement on the evaluation of inservice training. 

QUEST Reports are more comprehensive and include bibliographies. Viewpoints of teacher organizations seem certain to
become more influential in the determination of school policy and practice in the forthcoming decades, a probability which
adds significance to the material found in this source. 



Source: "Assessment of Learning Outcomes" by J. Thomas Hastings; in The Supervisor: New Demands-New Dimensions
(edited by William H. Lucio), Association for Supervision and Curriculum Development, NEA, 1201 Sixteenth Street
N.W., Washington, D.C. 20036, 1969. ($2.50)

Although the setting in which this presentation was made dictated that remarks be directed to school supervisors, the recent 
surge of interest in evaluation suggests that the article could readily serve as a general resource. The author's stress on the 
need to provide a broad framework for the examination of learning outcomes is the key element. 

Hastings briefly reviews and analyzes the work of leading authorities in the field of evaluation and measurement. His remarks 
range from appraisal of the broad descriptive goals of Robert Stake to an overview of the precise behavioral objectives of
Mager. The constant emphasis on the merits of full context of evaluation concludes with suggestions for the training and skills 
essential to thorough assessment of learning outcomes. 

Readers will find the graphic illustrations in this source to be extremely valuable as they reflect the basic structure of four 
major approaches to evaluation. The brief bibliography is tied directly to comments in the paper and represents some of the 
most significant writings in evaluation of the past two decades.

At a time when pressures are mounting for schools to hastily respond to demands for accountability, this paper represents a 
thoughtful rejoinder that learning is, indeed, a complex matter. 



Source: Educational Evaluation: Official Proceedings of a Conference, Ohio Department of Education, Columbus, Ohio
43200, 1969. 

This booklet is the product of a conference funded through Title V of the Elementary and Secondary Education Act of 1965. Resource
people included public school educators, personnel from state departments of instruction, scholars from the university community,
and school board members. Consequently, readers will find that the material covers a wide range of issues and provides
excellent diagnoses of important topics.

Unlike many other publications in the field, Educational Evaluation clearly distinguishes between research and evaluation in
its opening section. Explanatory charts and diagrams accompany the more technical presentations. Only one of the nine sections
has a bibliography. However, the latter appears as part of a closing presentation entitled "Current Problems in Educational
Evaluation and Accountability," and adds pertinent background information to an area of growing interest and
concern for both educators and laymen.

Educational Evaluation should serve schoolmen as a valuable general reference. Each of its sections represents an outstanding
benchmark to guide future explorations of topics related to the broad realm of evaluation and accountability. 

Source: Handbook on Formative and Summative Evaluation of Student Learning, Benjamin S. Bloom, J. Thomas Hastings,
George F. Madaus, New York: McGraw-Hill Book Company, 1971.

The authors indicate that this book is a comprehensive report on the "state of the art" of evaluating student learning. It is
intended, in their words, primarily for present and future classroom teachers. However, the work should serve as an excellent
resource for all individuals and groups interested in the field of evaluation.

Although this document represents a fairly massive compendium (it contains over 900 pages), readers will find numerous
guides to enhance their reading and interpretation of contents. Part one consists of 12 chapters and an appendix. This part
was prepared by the authors and provides a thorough treatment of the major substantive issues in evaluation of student learning,
although it only touches upon curriculum evaluation in a few chapters. The second half is composed of a series of chapters
written by scholars representing 11 areas of specialization. The topics covered in Part 2 range from preschool development
through all realms of subject-matter knowledge along with an evaluation of their relationships to the instructional and
learning contexts of formal education.

Diagrams, charts, and a host of operational examples highlight both parts of the work. Name and subject indices expedite 
usage of the material. Educators will discover that this book provides a definitive picture of the problems and possibilities 
suggested by the linking of evaluation to program development in our schools. 



Source: The Specification and Measurement of Learning Outcomes, David A. Payne, Waltham, Massachusetts: Blaisdell Pub- 
lishing Company, 1968. 

The author cites his basic purpose as an attempt to provide classroom teachers "with a practical and efficient set of techniques
to aid in evaluating student achievement." His opening chapter distinguishes between "measurement," "test," and
"evaluation" and furnishes guidelines for examining the wealth of remaining material. The major focus of the book is on 
measurement. 

Payne submits that experienced teachers should find the contents to be helpful in their daily work. His view may be some- 
what optimistic since, as he acknowledges, undergraduate teacher training is frequently deficient in developing measurement 
competencies. Current pressures on schools to become more accountable for student performance, though, are resulting in 
numerous inservice training programs designed to improve teacher and administrator understanding of issues encompassing 
educational measurement. 

The book should be a fine resource for inservice education, local study groups, etc. Its most effective use will require careful
planning and skillful instructional leadership. The author provides suggested readings at the close of each of the ten chapters,
supplies a comprehensive bibliography, and includes explanatory tables and charts throughout his work. Merits and demerits
of various approaches and instruments are noted.

Schools interested in launching inservice education programs as a response to cries for increased accountability will find The
Specification and Measurement of Learning Outcomes a worthwhile tool.



Linking Section 1 - General to the Specific



The five basic sources described in the preceding pages represent general references in the areas of evaluation and accountability.
Educators interested in pursuing issues raised by the authors identified in Part I will wish to explore sources noted in
the next few pages.

Learning is a complex matter, and the educational process cannot be honestly assessed without reliance on a variety of 
measures. Good men will disagree strongly about the proper approaches to use, but there is consensus that evaluation is a 
demanded and demanding task. Indications abound in the contemporary scene to suggest that cries for more accountability 
on the part of educational institutions are closely tied to fiscal affairs. True, education in a complex society is linked to economic
costs as well as benefits. Yet schools also are caught up in social-political controversies. Any attempt to fairly evaluate
their efforts must account for such complexity.

Materials cited in Part II provide opportunities for exploring the many facets of evaluation. Vivid illustrations of the intermingling
of economic-sociological-political elements support the statements above. The works found in this section are
deserving of scholarly attention, for they discuss some of the most profound problems and possibilities in American society.

PART II - PAMPHLETS/MONOGRAPHS,
AND SPECIAL MAGAZINE ISSUES 



Source: "Accountability in Education," Educational Technology, January, 1971.

The entire issue of this relatively new journal is devoted to the theme of accountability. A comprehensive treatment is 
provided although the major emphasis rests on a systems approach encompassing a variety of works dealing with performance 
objectives and educational audits for both rural and urban districts. 

Authors represented in this source include university and college professors and administrators; representatives of state
departments of education; and public school administrators. Analysis of accountability demands in VoTech Education and a
description of an attempt by a state department of education to create a statewide evaluation program underscore the range
of topics covered in this issue.

Copies of Educational Technology are available for $3.00; discounts are given for bulk orders. Reprints of individual articles
can be obtained for 25 cents each. Inquiries should be directed to: 

Educational Technology 

140 Sylvan Avenue 

Englewood Cliffs, New Jersey 07632 



Source: AERA Monograph Series on Curriculum Evaluation. Rand McNally and Company, P.O. Box 7600, Chicago, Illinois 
60680. 

Prospective users of this series should not be misled by the title. The six issues currently available deal with instructional
objectives and classroom observation, for example, in addition to an expected emphasis on curriculum. 

Public school personnel are likely to find Volumes 1, 3, and 6 the most helpful. Perspectives of Curriculum Evaluation, the
initial release, contains an article by Michael Scriven on "The Methodology of Evaluation," regarded as a landmark effort in
the field. Instructional Objectives, the third publication, includes a presentation by W. James Popham, a pioneer in the
development of behavioral objectives. Each of the four papers presented in this issue is followed by a discussion of its major 
points. The latest item entitled Classroom Observation examines the state of the art in attempts to derive certain principles of 
learning from better understanding of the relationship between classroom transactions and student growth. 

Each of the six publications is marked by extensive bibliographies, and comments by the series editor enhance understanding
of the contents.




Source: Behavioral Objectives: Science, Social Studies, Mathematics, Language Arts, A Guide to Individualized Learning.
John C. Flanagan, William M. Shanner, Robert F. Mager. Westinghouse Learning Corporation, 2680 Hanover Street,
Palo Alto, California 94304, 1971.

The point of view that no single source of information is adequate as a basis for wise decision-making guided the preparation
of this series, so say its authors.

Objectives found in the books originated from teachers and have been tried out in schools participating in Project PLAN, 
itself a partial outgrowth of the Project Talent study begun in 1960. Grades 1-12 are encompassed in each unit with the contents
divided by levels: Primary (1-3); Intermediate (4-8); Secondary (9-12).

The books are cross-referenced and cross-indexed. There is considerable overlapping of items related to four basic subject 
matter areas and the vast majority of objectives are identified with the cognitive area of learning. 



Source: Educational Evaluation and Decision-Making. The Eleventh Phi Delta Kappa Symposium on Educational Research. 
Phi Delta Kappa, Incorporated, Eighth and Union Street, Bloomington, Indiana 47401.

Phi Delta Kappa used the services and facilities of the Ohio State University Evaluation Center to hold a conference which 
involved a discussion of this work. Representing the efforts of a talented and diverse Study Commission, the theme of the
final report reflects an approach to evaluation identified with Daniel L. Stufflebeam, Director of the Center at Ohio State and
Chairman of the Phi Delta Kappa Commission on Evaluation. 

Evaluation is defined as "the process of delineating, obtaining, and providing useful information for judging decision alternatives."
And its purpose is seen as an attempt to improve, not to prove. The reader is informed that the book is intended for a
varied audience, although comments made later suggest that its most appropriate use is for evaluation units functioning inside
educational institutions.




The dilemma faced by conference participants, and a problem candidly admitted by writers of the contents, concerns the 
complexities of the decision-making process. If evaluation provides "useful information" for those who must decide, what
information is of most worth? Answers to that perplexing question are not found in the book, but it represents the culmination
of a time-consuming and arduous assignment by members of the Commission. The results provide innumerable ideas,
suggestions, and valuable counsel.



Source: Phi Delta Kappan, December, 1970 issue 

Eight articles, plus the editorial page, spotlight the theme of "accountability" in this issue. All of the authors place major 
emphasis on a systems approach to assessing educational outcomes. Readers will find approximately 40 pages dominated by 
terminology taken from the fields of economics and engineering. 

Although guest editor Myron Lieberman expresses concern that the articles do not present a case for alternatives to public 
schools, implementation of the ideas presented in them surely would result in drastic changes. In fact, Leon Lessinger's comments
pay particular attention to performance contracting by private enterprise, a phenomenon which has stirred much controversy
in recent months. Concern for fiscal aspects of accountability, perhaps a logical response to legislative and taxpayer
attacks, permeates these pages. 

At first glance, one gains the impression that the magazine provides a comprehensive treatment of accountability, but small 
schools are likely to find the time and cost demands, as suggested here, overpowering. Also, little space is devoted to the 
problem of definition of goals. 

Reprints of articles are available in minimum lots of 100. Inquiries should be directed to: 

Business Office
Phi Delta Kappan 

Eighth Street and Union Avenue, Box 789 
Bloomington, Indiana 47401 



Source: Review of Educational Research: Educational Evaluation. American Educational Research Association, 1126 16th 
Street, N.W., Washington, D.C. 20036, April, 1970. 

Critical reviews of some of the most important work in evaluation are found in this issue. While the major focus predictably
rests on a review of pertinent research in fields ranging from an analysis of measurement techniques to the assessment of
social action programs, there is much valuable information here for public school officials charged with responsibility for 
conducting evaluation. 

The material also places education in the context of public policy debates and political exchanges concerning the school's role
in a pluralistic society. Readers will discover that contributors to this issue deal with some emerging trends in evaluation and
accountability, particularly the interrelationships between goals-values-national priorities and the competing claims on all
social institutions, including education.

Bibliographies are very comprehensive, but the greatest value of this publication may well be its attempt to use past developments
and current practices as a gauge for judging future needs in the broad panorama of educational evaluation.

Linking Section 2 - Reliability Checks

Few institutions are ready to launch comprehensive programs of evaluation. Decisions about life in the schools cannot be
taken lightly, a caution firmly underscored by the authorities noted in Part II. Nevertheless, how can educators obtain some
of the most basic tools needed to carry on evaluation? Where can they find credible and helpful information?

Answers to such questions are suggested by the sources found in the upcoming section. Developmental efforts, where
applicable, were handled by reputable groups; the products emanating from these activities have been field tested by practicing
teachers and administrators. Training and information sites are accountable to government agencies, to private foundations,
and to a broad spectrum of clients.

Not all the "answers" will be satisfactory, and some of the most perplexing questions have no ready solutions. But the 
aspiring evaluator should find the resources cited in Part III capable of serving as the crux of a "quality control" system for 
evaluating institutional policies and practices and his relationship to them. 



PART III - SPECIAL AIDS AND RESOURCES



Source: Accountability Notebook.

Gifted Children Section
Department of Exceptional Children
Office of the Superintendent of Public Instruction
Springfield, Illinois 62705

Prepared in the form of a looseleaf notebook, this resource lends an added dimension of "responsibility" to the theme suggested
in the title. Accountability is regarded as including elements other than the economic one. Descriptive statements
covering the broad domain of educational accountability are followed in each case by illustrations of monitoring procedures
that might be employed by school systems. A third section leaves room for examples of local actions.



Source: Classroom Report. 

Gifted Children Section
Department of Exceptional Children
Office of the Superintendent of Public Instruction
Springfield, Illinois 62706 

This item was developed as a reporting device for teachers. It is designed to describe the class as a whole rather than individual 
students. Teachers can use it as one way of responding to certain calls for "Accountability"; i.e., reporting on class activities, 
expectations, and barriers. A manual featuring a collection of examples serves as a guide. 



Source: EPIC Diversified Systems Corporation. 

P.O. Box 13052 
Tucson, Arizona 85711 

This organization is an outgrowth of a Title III, ESEA, Project entitled Project EPIC. Originally funded as a model prototype 
for assisting schools with the evaluation of curricula, the enterprise has spawned training programs and the publication of 
evaluative materials. Major efforts are focused on three areas: 1) Accountability services, including educational audits; 2) 
Comprehensive planning and evaluation with an emphasis on needs assessment; 3) Leadership training institutes. 



Source: Instructional Objectives Exchange. 

Center for the Study of Evaluation
University of California
Los Angeles, California 90024



The Exchange is devoted to the collection and distribution of operationally stated instructional objectives and related evalua- 
tion measures. Sets of objectives and items in virtually all subject areas are available beginning at the kindergarten level. Edu- 
cators can select materials from the depository which appear most suitable to their specific requirements. 



Source: The Educational Product Report 

Educational Products Information Exchange
(EPIE) Institute, 386 Park Avenue South
New York, New York 10016

EPIE publishes this item as a forum for providing descriptive and evaluative information and commentary about all types of 
learning materials, equipment, and systems. The Report is supported entirely by subscriptions and does not accept advertis- 
ing. Interested persons will find the February, 1969, issue on "Educational Evaluation: Theory and Practice" particularly 
useful. 




Linking Section 3 - Other References 



A major barrier to increased use of evaluation is the narrow image many public school teachers and administrators have of
the field. Rarely has their undergraduate or graduate training provided sustained contacts with the questions of assessment
and decision-making now confronting them. The preceding pages represented an effort to pinpoint specific sources of help.
Part IV builds on that base, extends it, and attempts to highlight some of the latest ideas and approaches to evaluation. 

References listed in Item A are cited in "Assessment of Learning Outcomes," one of the five Basic Sources found in Part I.
Hastings refers to them in the context of his paper, furnishing operational examples and providing linkage between various
ideas contained in these important works.

Item B is comprised of ten references designed to complement the areas and issues covered in other sections of this 
Resource Unit. In some cases, the listing refers to another bibliography. Readers will note the diverse range of topics covered 
here, for the list reflects the rich ferment now marking developments in evaluation and accountability.


Linking Section 4 - Working Guides



The twelve items found in Part V are labeled "The Daily Dozen" in reference to their emphasis on utility. Their selection
and arrangement draw upon the results of field studies conducted in 1968-69 and also reflect judgments obtained from individuals
and groups concerning the contents of this Resource Unit. There is considerable variance among these resources for
they are based on reported needs in the realm of operational evaluation, i.e., meeting the functional demands of school
personnel.

An arbitrary arrangement featuring four (4) clusters of task-related items characterizes this section. An introductory note
precedes each division. The first division contains materials generally identified with goal selection and choice of objectives.
Cluster 2 holds various types of checklists, rating forms, testing procedures, grading plans. It is followed by two outlines for
evaluating total school programs. Attitudes, judgments, and opinions receive due attention in the final cluster. A sophisticated
analysis of weaknesses in performance contracting is accompanied by an explanatory note written in "layman's language."
Part V concludes with a scale for determining attitudes toward educational evaluation. It, too, features side comments.




PART V - THE DAILY DOZEN



Introductory Note - Cluster 1: "Goals and Objectives"

Schools currently pursue many more objectives than any educator can specify, or that any evaluation plan can accommo- 
date. As recent history shows, public discussion of educational goals frequently evokes strong political and social reaction. 
Cultural diversity has enriched our nation, and future planning based on attempts to provide means for making educated 
choices appears to be a necessity, both for maintaining strong school systems and for the welfare of our diverse peoples. 

The two items found in this cluster point to certain procedures for consideration in choosing goals and pursuing objectives. 
They also emphasize the vital importance of the process used in those activities. 




An important task in any evaluation is the examination of 



Educational Goals 



Who has goals? 

Many groups of people have ideas concerning the proper goals of public education. The educational goals which a particular 
school system pursues reflect the ideas and influence of groups such as the administrators, teachers, boards of education,
parents, religious leaders, businessmen, and professors of education. 

Goals compatible? 

We must often determine if goals advocated by different groups are compatible. When differences in objectives exist (as they
frequently do), the evaluator may be asked to help describe these differences; for instance, to state which groups are taking
which positions on goals. One of legitimate evaluation's responsibilities is the collection of information about the goal preferences
of different groups.

What goals to evaluate? 

Key questions are whose goals? or what goals? The question of goal priorities always emerges. The school coach, the art 
teacher, and a local businessman will likely differ on what educational goals they think are important. 



Establishing priorities. 

Evaluators should be interested in how priorities are established and may include a description of this process in their evalua- 
tion report. Questions of concern to the evaluator are: Who has the legal authority to establish priorities about educational 
goals? Who has the informal power to do so? Who actually makes what decisions about which goals? 



Identifying (measuring) goals. 

Once goals are known, the evaluator must identify or describe these goals accurately in greater detail. Controversy centers on 
the question of the importance of translating all educational goals into strictly behavioral terms to measure achievement or 
nonachievement. Evaluators are often better able to make this translation than are curriculum specialists and teachers. 



Goals achieved? 

How to determine whether the goals or "intended outcomes" of a program or course have been achieved? Recent writers on 
evaluation argue that evaluation should also provide information useful in learning "why" goals were or were not achieved. 
Techniques for measuring outcomes range from psychometric tests to observation schedules to anthropological studies. 



For further ideas on educational goals as they relate to evaluation see: 

1. Atkin, J. M., Behavioral Objectives in Curriculum Design: A Cautionary Note. Science Teacher, 35, No. 5, May 1968,
27-30. 

2. Popham, J.W., Probing the Validity of Arguments Against Behavioral Goals. Paper read at American Educational Research 
Association, Chicago, February 1968. 

3. Stake, R. E., The Countenance of Educational Evaluation. Teachers College Record, 1967, 68, 523-540.
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EDUCATIONAL OBJECTIVES 



1. In any teaching a great number of objectives are simultaneously pursued. High-priority, immediate objectives should 
usually be apparent to teacher and learner alike. Occasionally, either will do better without being aware of them. High- 
quality education is often accomplished by educators having but a partial awareness of the objectives. Sometimes it will 
increase teaching-learning effectiveness to make participants more aware of objectives; sometimes it will not. 

2. With all who share the responsibility of educating lies the responsibility for stating objectives, arranging environments, 
providing stimulation, evoking responses, and evaluating those responses. But each author and teacher does not share 
equally in those responsibilities. Time and talent are not available in limitless abundance. Each educator's 
assignment should capitalize on what he can do best. Few classroom teachers are skilled in stating objectives. 
Most are more highly skilled in adapting teaching to immediate circumstances, motivating students, and 
appraising responses. In the interests of effectiveness, seldom should they be required to formulate behavioral specifications. 

3. There are more objectives to pursue than we can pursue. Time and resources restrict us. We assign priorities to our goals 
in a highly informal way. Even this informal priority list is not always the critical determinant of the daily lesson or the 
minute-by-minute dialogue. Some moments are ripe for teaching toward an unplanned objective. A sound educational 
system is one which provides for occasional reassignment of immediate objectives to take advantage of the special 
opportunities that occur. 

4. The development of a new curricular program or set of instructional materials often proceeds better by successive 
approximations than by linear programming. With successive approximations, major attention is given to getting an 
enterprise in operation, even though the initial runs are crude and faulty, so that corrections can be based on experience. 
With linear programming, major attention is given to planning, precise specification, and symbolic representation 
so that corrections can be based on logical analysis. Advice on curriculum planning should be oriented to the experiential 
and logical skills already developed in the developers or that can be readily obtained by them. 

5. For creating lists of objectives, the technology of education should have some methods that rely on behavioral specification 
and symbolic delimitation and other methods that rely on illustrative examples and inferable definitions. We 
need methods by which educators and others can endorse, reject, or revise statements of objectives. Two colossal 
problems lie before us: how to translate global objectives into specific behavioral objectives and how to derive appropriate 
teaching tactics. 

6. Our curriculum-development projects and our evaluation studies seldom reach a satisfactory specification by asking 
educators to state their objectives. Educators' global objectives give little guidance to teaching and evaluation. Their 
specific objectives ignore vast concerns that they have. In our present state the derivation of the specific from the general 
is some form of intuitive magic. Luckily it often works pretty well. We need to understand it, to simulate it, not 
necessarily to replace it. 
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Introductory Note - Cluster 2: "Checklists, Rating Forms, Testing Procedures, Grading Plans" 



Respondents in the field testing of an Evaluation Kit in 1968-69 stressed a need for "How to" materials. The six items 
found in this section are a response to their comments. Readers will discover that the materials encompass varying levels of 
utility depending on the local situation. 

Illustrations of curriculum evaluation, item sampling, and textbook analysis may be more appropriate for system-wide 
committee usage. Likewise, teachers will find the grading plan provocative, and administrators may wish to discuss implica- 
tions of the issues covered by evaluation reports. 
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FORMAT FOR AN EVALUATION REPORT FOR AN EDUCATIONAL PROGRAM 

SECTION I - OBJECTIVES OF THE EVALUATION 

A. Audiences to be Served by the Evaluation 

B. Decisions about the Program, Anticipated 

SECTION II - SPECIFICATIONS OF THE PROGRAM 

A. Educational Philosophy Behind the Program 

B. Subject Matter 

C. Learning Objectives, Staff Aims 

D. Instructional Procedures, Tactics, Media 

E. Students 

F. Instructional and Community Setting 

G. Standards, Bases for Judging Quality 

SECTION III - PROGRAM OUTCOMES 

A. Opportunities, Experiences Provided 

B. Student Gains and Losses 

C. Side Effects and Bonuses 

D. Costs 

SECTION IV - RELATIONSHIPS AND INDICATORS 

A. Congruence 

B. Contingencies 

C. Trend Lines, Indicators, Ratios 

SECTION V - JUDGMENTS OF WORTH 

A. Value of Outcomes 

B. Relevance of Objectives to Needs 

C. Usefulness of Evaluation Information Gathered 
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A CHECKLIST FOR RATING AN EVALUATION REPORT 



This checklist can be used to examine the report of an evaluation of an educational program to see if the report provides 
complete and useful information. 



Area I - THE EVALUATION ITSELF 

A. Audiences to be served by the evaluation 

B. Decisions about the program, anticipated 

C. Rationale, constraints, bias of evaluators 

Area II - SPECIFICATIONS OF THE PROGRAM BEING EVALUATED 

A. Educational philosophy behind the program 

B. Subject matter to be taught 

C. Learning objectives, staff aims 

D. Instructional procedures, tactics, media 

E. Students: biography, readiness, goals, etc. 

F. Instructional and community setting 

G. Standards, bases for judging quality 

Area III - PROGRAM OUTCOMES 

A. Opportunities, experiences provided 

B. Student gains and losses 

C. Side effects and unexpected bonuses 

D. Costs: cash, resources, work, morale 

Area IV - RELATIONSHIPS AND INDICATORS 

A. Congruence between intent and actuality 

B. Contingencies, causes and effects 

C. Trend lines, indicators, comparisons 

Area V - JUDGMENTS OF WORTH OF THE PROGRAM 

A. Value of outcomes, different points of view 

B. Relevance of objectives to needs 

Readability of report 

Usefulness of evaluation information gathered 

Rating columns: Well Stated | Needs Better Statement | Not Stated | Not Applicable 

Comments: 




AROO GRADING PLAN 



A school-wide plan for assignment of final course grades to students. 
Four grades are submitted for each student except where the instructor 
feels he does not have an adequate basis for grading. 



ABSOLUTE JUDGMENT 
Information: The individual student's quality of work and readiness to perform with competence 
Source of objectives: The goals set by the academic department 
Standard: Quality and competence as valued by the instructor 

RELATIVE STANDING 
Information: The individual student's standing among peers as to work quality and competence 
Source of objectives: The goals set by the academic department 
Standard: Empirically determined quality and competence of the designated reference group 

OPPORTUNITY USED 
Information: The individual student's effort and success in using this learning opportunity 
Source of objectives: The goals set by the individual student 
Standard: Judgment by the instructor as to whether student successfully pursued any educational objectives 

OPPORTUNITY PROVIDED 
Information: Quality of learning opportunity provided by school, instructor and other students 
Source of objectives: The collective goals of the students (not specified) 
Standard: Judgment by the students (one mark, the same for the whole class) 



Categories for reporting and interpreting grades: 

Absolute Judgment: 5 = Excellent 4 = Good 3 = Fair 2 = Poor 1 = Very Poor 
Relative Standing: 5 = Superior 4 = Above average 3 = Average 2 = Below average 1 = Inferior 
Opportunity Used: 5 = Excellent 4 = Good 3 = Fair 2 = Poor 1 = Very Poor 
Opportunity Prov.: 5 = Excellent 4 = Good 3 = Fair 2 = Poor 1 = Very Poor 

From: "Grading Students in the Real World," 
Robert E. Stake (Las Cruces, Jan. 11, 1971) 
Robert E. Stake, CIRCE, University of Illinois 

Background reference: 

Warren, Jonathan R., College grading practices: 
an overview. Berkeley: Educational Testing 
Service, 1970 
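
For a school that keeps grade records by machine, the four-grade plan above might be held in a small record like the one sketched here. This is only an illustration: the category labels are taken from the plan, but the class name, field names, and the "Not graded" placeholder (for the case where the instructor has no adequate basis for grading) are assumptions, not part of the plan itself.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Category labels copied from the plan; the dictionary layout is an assumption.
STANDARD_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Very Poor"}
RELATIVE_LABELS = {5: "Superior", 4: "Above average", 3: "Average",
                   2: "Below average", 1: "Inferior"}

LABELS = {
    "absolute_judgment": STANDARD_LABELS,
    "relative_standing": RELATIVE_LABELS,
    "opportunity_used": STANDARD_LABELS,
    "opportunity_provided": STANDARD_LABELS,
}


@dataclass
class CourseGrades:
    """Four grades for one student in one course; None means no grade was submitted."""
    absolute_judgment: Optional[int] = None
    relative_standing: Optional[int] = None
    opportunity_used: Optional[int] = None
    opportunity_provided: Optional[int] = None

    def report(self):
        """Map each grade to its category label ('Not graded' is a hypothetical placeholder)."""
        out = {}
        for f in fields(self):
            mark = getattr(self, f.name)
            out[f.name] = "Not graded" if mark is None else LABELS[f.name][mark]
        return out
```

Keeping the four marks separate, rather than averaging them, follows the plan's intent that each grade answers a different question.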
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One way of measuring the achievement of student groups is to use 



ITEM SAMPLING 



This sheet tells the story of an evaluation of Spanish instruction via television, using item-sampling techniques. To get that 
story, the evaluator needed a good picture of learning. During the year he tested the students for 15 minutes every two weeks 
on their understanding of Spanish vocabulary. 

To start out, he made a pool of 360 test items, then randomly sorted them into nine tests of 40 items each. The test items 
looked like those below, although most were harder. 

Roma es la capital de 
(1) Francia 
(2) Brasil 
(3) Mexico 
(4) el Canada 
(5) Italia 

Sombrero 
(1) darkness 
(2) sleep 
(3) hat 
(4) summer 
(5) cow 



Every second Thursday tests were randomly assigned to students. Therefore, all 360 were used twice a month even though 
each student was answering only 40 items twice a month. Note that the 360 items covered a lot of vocabulary each month 
even though each student's testing time was only 30 minutes. 

A student could draw the same test both times during the month, but that did not happen often. It would have been 
better to have let the students take each test only once every 18 weeks but that would have required a little more work 
and class time. 

Since the same 360-item "test" was given to the TV-taught students each testing time, group means were used to show 
progress in learning vocabulary across the year. Here is the curve of progress for the TV students: 

[Figure: curve of progress for the TV students. Number of students = 23. Vertical axis: mean items right per student; horizontal axis: week (2 through 36).] 

This evaluation design is like a pretest-posttest design plus a lot of intermediate testing. The curve of progress tells much more 
than that there is a mean gain of 4 items in 30 weeks. 

The results of the evaluation study are more meaningful when the evaluator shows the progress of students taught in other 
classes. 

[Figure: curves of progress for the conventionally taught students (N = 31) and the TV students (N = 23). Vertical axis: mean items right per student (0 to 40); horizontal axis: week (2 through 36).] 

Of course, none of this is meaningful without an understanding of what the TV and conventional instruction consisted of. 
Furthermore, there are other aspects of achievement besides vocabulary, e.g., pronunciation, grammar, idiom, word derivation, 
literary usage, appreciation, etc. 

Item sampling is a valuable evaluation tactic when the interest is in group achievement rather than individual achievement, 
when the content to be covered is broad, and when only a little time for testing is available. 
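
The mechanics of the design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pool size, number of tests, and items per test come from the text, while the function names, the integer item ids, and the fixed random seeds are hypothetical and not part of the original study.

```python
import random
from statistics import mean

POOL_SIZE = 360      # items in the pool (from the text)
NUM_FORMS = 9        # tests built from the pool (from the text)
ITEMS_PER_FORM = 40  # items on each test (from the text)


def build_forms(pool_size=POOL_SIZE, num_forms=NUM_FORMS, seed=0):
    """Randomly sort item ids 0..pool_size-1 into equal-sized test forms."""
    rng = random.Random(seed)
    items = list(range(pool_size))
    rng.shuffle(items)
    size = pool_size // num_forms
    return [items[i * size:(i + 1) * size] for i in range(num_forms)]


def assign_forms(students, num_forms=NUM_FORMS, seed=0):
    """Randomly assign one form index to each student for one testing day."""
    rng = random.Random(seed)
    return {student: rng.randrange(num_forms) for student in students}


def group_mean(scores):
    """Mean items right per student for one testing day (scores: student -> items right)."""
    return mean(scores.values())
```

Plotting `group_mean` for each testing day gives a curve of progress for the group. Note that independent random assignment, as sketched here, does not guarantee that every form is drawn on a given day; a rotation scheme would.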
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Do you have the job of 

Selecting a textbook? 

A potential textbook selector who has answered the following questions (as well as others) should be able to make realistic 
and knowledgeable judgments concerning the selection of texts and other educational materials. 

1. Why is the present text inadequate? 

2. Will the adoption of this new text further the educational interests of the community? 

3. What are the specific educational objectives in this content area, and how well will this text help in reaching such 
objectives? 

4. Will this text prove compatible with present instructional methodology? 

5. Will the students find the text easy to follow and comprehend? Attractive? 

6. Has the selection of this text been preceded by an objective consideration of other available textbooks? 

7. Have appropriate book reviews, reports, or institutional comments on the text's usefulness been consulted?* 

8. What provisions have been made for an ongoing evaluation of the text if it is accepted? 

9. Does the publisher provide additional materials such as a teacher's handbook, workbooks, examinations, etc., to 
go along with the text? 

10. Is the purchase of a text the best use of limited financial resources? 

11. Have alternative instructional materials been investigated? 

* A further discussion of problems involved in the evaluation of instructional materials can be found in: 
The EPIE Forum, I, No. 4-5, December 1967 and January 1968 
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Introductory Note - Cluster 3: "Comprehensive Evaluation" 

While it is unlikely that any plan of evaluation could meet every single demand for accountability, the illustrations to follow 
are indicative of the countless responsibilities confronting schools and their communities. 

Attempts to deal with accountability on a wide front cannot escape the implications of sharing power. When individuals 
and groups are included in the decision-making process, they are more willing to assume responsibility for consequent actions. 
Accountability carries with it definite overtones of responsibility. Unless the pivotal role of schools in modern society is 
understood by all of us there is real danger that only individuals, most of them involved in the instructional process, will face 
the consequences of our present desire to evaluate the schools. 
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EVALUATION REPORTS 
DESCRIBING THE CONTEXT OF A PROGRAM 



City or Community Characteristics 

What is the population of the city or community? 

What adjective(s) would typically be used to describe the city or community? 
In what part of the country is it located? 

What is the percentage of deteriorating or dilapidated housing in the city or community? 

What is the city- or community-wide unemployment rate? 

What percent of families in the city or community are on welfare? 

What is the city- or community-wide literacy rate? 

What is the city- or community-wide school dropout rate? 

What is the city- or community-wide delinquency rate? 

Are there any special educational problems faced by the city or community? 

What attempts, if any, are being made to deal with these problems? 

Neighborhood Characteristics 

What adjective(s) would typically be used to describe the neighborhood(s)? 
What is the average family income in the neighborhood(s)? 
What is the literacy rate in the neighborhood(s)? 

What kinds of occupations do most of the people in the neighborhood(s) have? 

What is the unemployment rate of the neighborhood(s)? 

What percent of the families in the neighborhood(s) are on welfare? 

What is the percent of nonintact families in the neighborhood(s)? 

What ethnic groups, in what percent, are represented in the neighborhood(s)? 

What linguistic groups, in what percent, are represented in the neighborhood(s)? 

What is the population density (number of people per square mile) in the neighborhood(s)? 

What is the percent of multi-family dwellings in the neighborhood(s)? 

What percent of the dwellings were built pre-1940 in the neighborhood(s)? 

What percent of the dwellings are rental (rather than owner-occupied) in the neighborhood(s)? 

What is the percent of deteriorating or dilapidated housing in the neighborhood(s)? 

What is the school dropout rate in the neighborhood(s)? 

What is the delinquency rate in the neighborhood(s)? 

Have these neighborhood characteristics remained constant in the last few years or is the neighborhood(s) in transition? 

School Characteristics - General 

What was the per capita expenditure, including both capital and operating expenses, prior to the program? 

What was the salary range for teachers in the school(s) for the year immediately preceding the program? 

What is the age and condition of the main school building(s)? 

What grade levels were included in the school(s)? 

What was the average teacher-pupil ratio in the school(s)? 

How were the students routinely grouped in the school(s)? 

Were any pupils enrolled in the school(s) as a result of a bussing or open enrollment program? 
Was a conventional curriculum followed in the school(s)? 

What services, personnel, or special programs were available in the school(s) prior to the program? 

Were any other specially funded programs ongoing in the school(s) prior to the beginning of this program? 

At what intervals are achievement tests routinely given? 

What achievement tests are routinely given? To what grades? 

How are these achievement tests administered and by whom? 

How did the achievement level of the school(s) compare with city-wide and/or national norms prior to the program? 

School Characteristics - Teachers 

What were the paper qualifications of the teachers? 

What was the average number of years of teaching experience? 

What was the average age of the teachers? 

What was the male-female ratio of teachers? 

What ethnic groups, in what percent, were represented by the teachers? 

What linguistic groups, in what percent, were represented by the teachers? 

What was the teacher turnover in the school(s) prior to the beginning of the program? 

Prepared by C. O. Neidt, Colorado State University, 1969. 
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School Characteristics - Student Body 



What was the pupil enrollment in the school(s) at the beginning of the academic year? 
How many pupils withdrew or transferred from the school(s) after the school year began? 
How many pupils enrolled in the school(s) after the school year began? 
What was the average daily attendance in the school(s)? 

Has the total pupil enrollment in the school(s) involved in the program changed in the last three years? 
What ethnic groups, in what percent, were represented by the students? 
What linguistic groups, in what percent, were represented by the students? 
What was the male-female ratio of the students? 

Historical Background 

Did the program exist prior to the time period covered in the present report? 
Is the program a modification of a previously existing program? 
How did the program originate? 

What special efforts were made to gain acceptance of the program by parents and the community before it began? 
If special problems were encountered in gaining acceptance of the program by parents and the community, how were 
these solved so that the program could be introduced? 
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DESCRIBING THE TREATMENT PROVIDED BY 
A PROGRAM 



Personnel: Instructional and Noninstructional 

What categories of personnel were added by program? 

What regular staff were assigned to program? 

What new staff were hired for program? 

What were paper qualifications for various personnel? 

What were average years of relevant experience of personnel? 

What were the most important duties of personnel? 

What was the time commitment of various personnel? 

What inservice training was provided? 

What was the male-female ratio of classroom personnel? 

What personnel characteristics enhanced or reduced program effectiveness? 

How did special needs of pupils affect staff development and utilization? 

Supporting Services 

What services were part of the program? 

What services were available to experimentals? To controls? To both? 
How did special needs of pupils affect provision of services? 

Organization: Schedules 

For how long did the program operate? 

How were experimental and control classes scheduled in the total school context? 

How many hours of instruction did experimentals receive? Controls? 

Were time intervals between learning and testing equivalent for these groups? 

Organization: Planning 

Were meetings held regularly for experimental and control teachers? 
What were the purposes of these meetings? 
Who was present (besides teachers) and why? 

Organization: Physical Arrangements 

Where were experimental classes located? 
Where were control classes located? 

What were the most noteworthy features of physical arrangements in each? 

Organization: Grouping of Teachers 

How were experimental and control teachers grouped for instructional purposes? 

Organization: Grouping of Pupils 

How were pupils grouped within the total school context? 
How were pupils grouped for instruction in experimental and control classes? 
How many children were in each experimental class? In each control class? 

Major Program Segments 

What major segments comprised program? 

Which of these were available to experimentals? To controls? To both? 
Were segments equivalent for these groups in the following respects: 

Objectives? 

Emphasis? 

Provision for motivating pupils? 
How did special needs of pupils affect content of major program segments? 
What characteristics of these segments enhanced or reduced program effectiveness? 



Methodology: Pupil Activities 

What were main activities of experimentals? Of controls? 

How much time was devoted to each main activity? 

How many pupils were involved in each? 

How were instructional materials used by pupils in each? 

Did pupils have freedom of choice in participating in each main activity? 

How much time did pupils spend in the program each day? Each week? 

Methodology: Teacher Activities 

What were main activities of teachers in experimental and control classes? 

How much time did the teacher spend with the pupils? 

What was the teacher-pupil ratio (or aide- or adult-pupil ratio)? 

What provision did the teacher make for pupil response? 

How did the teacher use various instructional materials for the activity? 

What provision did the teacher make for pupil response? 

To what extent were teachers free to experiment with teaching methods? 

How did the teacher give feedback to pupils on individual progress? 

What provision did the teacher make for motivating pupils? 

Were amounts of practice, review, and quiz activities equivalent for both groups? 

Was content of these activities equivalent for both groups? 

How did special needs of pupils affect teaching methods? 

What characteristics of activities enhanced program success? 

Instructional Equipment and Materials 

What equipment and materials were used by experimentals? Controls? Both? 
In what amounts? 

What equipment and materials were used in each main activity in the two groups? 
What specific features suited a given device to a particular activity? 
Were materials equivalent for both groups in the following respects: 

Subject-matter content? 

Content of drill? 

Vocabulary level? 

What instructional materials were developed for program? How were they developed? 
What characteristics of materials enhanced or reduced program effectiveness? 
How did special needs of pupils affect selection or development of materials? 

Parent-Community Involvement 

What provisions were made for parent and/or community involvement in the program? 
Were these provisions equivalent for parents of experimentals and controls? 

Were group meetings and/or parent conferences held for parents of experimentals and controls? Describe. 

Budget 

What was the total cost of program? (indicate length of time covered) 
From what sources were these funds obtained? 

What portion of total program cost was start-up expense? Continuation expense? 
Can you break down total program cost into broad categories of expenses? 
If the program were repeated, how would you modify the budget? 
What was per-pupil cost of program? 

How does it compare with normal per-pupil cost of schools in the program? 
Where can the reader get additional budget information? 
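
The arithmetic behind several of the budget questions above is simple enough to sketch. The following is a minimal illustration only; the function names are hypothetical, and any dollar figures used with them are invented for the example rather than drawn from a real program.

```python
def per_pupil_cost(total_cost, pupils):
    """Total program cost divided by the number of pupils served."""
    if pupils <= 0:
        raise ValueError("pupils must be positive")
    return total_cost / pupils


def start_up_share(start_up, continuation):
    """Portion of total program cost that was start-up expense."""
    total = start_up + continuation
    if total <= 0:
        raise ValueError("total cost must be positive")
    return start_up / total


def cost_ratio(program_per_pupil, normal_per_pupil):
    """Program per-pupil cost relative to the schools' normal per-pupil cost."""
    if normal_per_pupil <= 0:
        raise ValueError("normal per-pupil cost must be positive")
    return program_per_pupil / normal_per_pupil
```

A report should state the length of time the total covers, since a per-pupil figure for one semester is not comparable with an annual per-pupil cost.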



DESCRIBING, ANALYZING AND INTERPRETING EVIDENCE OF CHANGES 

INDUCED BY A PROGRAM 

Objectives: 

What was the program aiming to do for the children and adults in it? 

Were the children expected to improve their scores on achievement measures? If so, in what areas? 
Were the teachers or other adults expected to change their modes of instruction? 
Were the children expected to change their attitudes? If so, which ones? 
Were the teachers or other adults expected to change their attitudes? If so, which ones? 

Sampling Procedures: 



How were the children and adults in the program chosen? 

Were the samples originally representative of the populations from which they were chosen? 

Were the controls selected before or after the program? 

Were steps taken to avoid the samples being affected by other programs? 

Were steps taken to avoid real differences in the quality of teachers selected for experimental and control groups? 
Was there attrition of the samples? 

Was there attrition of groups of children with the same characteristics? 
Were pupils added to the samples to replace dropouts? 

Were there many children who did not receive the treatment often because of poor attendance? 
Did the children participate voluntarily? 

Were the same children included in both pretest and posttest samples? 
Describing Samples: 

Which children received the treatment, from which adults? 

What is the size of the experimental sample? 

What is the age or grade level of the experimental sample? 

How is the experimental sample divided into boys and girls? 

Are achievement scores available by which to describe the experimental sample? 

Which adults gave the treatment that constituted the program? 

Measuring Change: 

What measures were applied to find out whether the program's aims had been achieved? 

Were the measures matched to the objectives in content? 

Did the tests used have sufficient "floor" and "ceiling"? 

Were the same measures used for both experimental and control groups? 

Were the same measures (or parallel forms) used for both pre and posttesting? 

Were IQ tests used when achievement tests were more appropriate? 

Was the reliability of the tests quoted? 

Under what conditions were the measures applied? 

Were the same or different testers used for successive testings? 

Were oral, or written, instructions available for the tests? 

Were assessors or observers likely to bias the results for or against the program? 

How much time elapsed between testings? 

Were assessors or observers specially trained? 

Presenting Data: 

What data were obtained from the measures applied? 

What measures of central tendency should be used? 
What measures of dispersion were used? 

Were there graphical displays which could have been used to present data more clearly? 
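
As a minimal sketch of what the questions on central tendency and dispersion ask for, the following computes a mean, a median, a standard deviation, and a range for one group. The gain scores are hypothetical, invented only for illustration, and the variable names are assumptions rather than anything prescribed by this checklist.

```python
import statistics

# Hypothetical gain scores (posttest minus pretest, in grade-equivalent years)
# for one experimental group; these numbers are made up for illustration.
gains = [0.4, 1.1, 0.7, -0.2, 0.9, 1.5, 0.6, 0.8]

# Measures of central tendency
mean_gain = statistics.mean(gains)
median_gain = statistics.median(gains)

# Measures of dispersion
sd_gain = statistics.stdev(gains)        # sample standard deviation
gain_range = max(gains) - min(gains)

print(f"mean   = {mean_gain:.3f}")
print(f"median = {median_gain:.3f}")
print(f"SD     = {sd_gain:.3f}")
print(f"range  = {gain_range:.3f}")
```

Reporting the median alongside the mean is one way to show whether a few extreme scores are pulling the average.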

Analyzing Data: 

What analyses were undertaken of the data? 

Was there a proper basis against which to compare the progress of the experimental group? 
What was the correlation between pretest and posttest? 
What comparisons were drawn for subsamples? 

Is there any evidence that children who attended more gained more from the program? 
Was the formula or source given for the statistical test applied? 
Did the data meet the prerequisites for the statistical tests used? 
Were there real differences between the groups? 
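
The question about the correlation between pretest and posttest can be answered with a Pearson coefficient. The sketch below is one possible way to compute it; the function name `pearson_r` and the sample scores are hypothetical and are not taken from any program described here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical grade-equivalent scores for the same six pupils (illustration only).
pretest = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5]
posttest = [2.9, 3.8, 3.5, 4.6, 3.3, 3.4]

r_pre_post = pearson_r(pretest, posttest)
print(f"pretest-posttest correlation = {r_pre_post:.2f}")
```

The same function could be applied to days attended and gain scores to look for evidence that children who attended more gained more.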

Drawing Conclusions: 



What conclusions were drawn from the analyses of the results? 

Were the conclusions based on statistical probability? 

Were the statistical conclusions translated into ordinary language? 

Were other conclusions stated in ordinary language? 

Can the conclusions be generalized, or are they applicable only to the sample or population served by the program? 

Were the conclusions of educational importance? 

What recommendations can be based upon the conclusions? 

MULTIPLE CRITERION MEASURES FOR EVALUATION OF SCHOOL PROGRAMS* 



Newton S. Metfessel and William B. Michael 
University of Southern California 



I. Indicators of Status or Change in Cognitive and Affective Behaviors of Students in Terms of Standardized Measures and 
Scales. 

Standardized achievement and ability tests, the scores on which allow inferences to be made regarding the extent 
to which cognitive objectives concerned with knowledge, comprehension, understandings, skills, and applications 
have been attained. 

Standardized self inventories designed to yield measures of adjustment, appreciations, attitudes, interests, and 
temperament from which inferences can be formulated concerning the possession of psychological traits (such as 
defensiveness, rigidity, aggressiveness, cooperativeness, hostility, and anxiety). 

Standardized rating scales and check lists for judging the quality of products in visual arts, crafts, shop activities, 
penmanship, creative writing, exhibits for competitive events, cooking, typing, letter writing, fashion design, and 
other activities. 

Standardized tests of psychomotor skills and physical fitness. 

II. Indicators of Status or Change in Cognitive and Affective Behaviors of Students by Informal or Semiformal Teacher- 
made Instruments or Devices. 

Incomplete sentence technique: categorization of types of responses, enumeration of their frequencies, or ratings 
of their psychological appropriateness relative to specific criteria. 

Interviews: frequencies and measurable levels of responses to formal and informal questions raised in a face-to- 
face interrogation. 

Peer nominations: frequencies of selection or of assignment to leadership roles for which the sociogram technique 
may be particularly suitable. 

Questionnaires: frequencies of responses to items in an objective format and numbers of responses to categorized 
dimensions developed from the content analysis of responses to open-ended questions. 

Self-concept perceptions: measures of current status and indices of congruence between real self and ideal self - 
often determined from use of the semantic differential or Q-sort techniques. 

Self-evaluation measures: student's own reports on his perceived or desired level of achievement, on his percep- 
tions of his personal and social adjustment, and on his future academic and vocational plans. 

Teacher-devised projective devices such as casting characters in the class play, role playing, and picture interpreta- 
tion based on an informal scoring model that usually embodies the determination of frequencies of the occur- 
rence of specific behaviors, or ratings of their intensity or quality. 

Teacher-made achievement tests (objective and essay), the scores on which allow inferences regarding the extent 
to which specific instructional objectives have been attained. 

Teacher-made rating scales and check lists for observation of classroom behaviors: performance levels of speech, 
music, and art; manifestation of creative endeavors, personal and social adjustment, physical well-being. 

Teacher-modified forms (preferably with consultant aid) of the semantic differential scale. 

III. Indicators of Status or Change in Student Behaviors Other than Those Measured by Tests, Inventories, and Observation 
Scales in Relation to the Task of Evaluating Objectives of School Programs 

Absences: full-day, half-day, and other selective indices pertaining to frequency and duration of lack of attendance. 

*Appended material to paper entitled "Paradigm Involving Multiple Criterion Measures for the Evaluation of the 
Effectiveness of School Programs" presented at the 1967 Annual Meeting of AERA, February 16, 1967, held in 
New York City. 

Anecdotal records: critical incidents noted including frequencies of behaviors judged to be highly undesirable or 
highly deserving of commendation. 

Appointments: frequencies with which they are kept or broken. 

Articles and stories: numbers and types published in school newspapers, magazines, journals, or proceedings of 
student organizations. 

Assignments: numbers and types completed with some sort of quality rating or mark attached. 

Attendance: frequency and duration when attendance is required or considered optional (as in club meetings, 
special events, or off-campus activities). 

Autobiographical data: behaviors reported that could be classified and subsequently assigned judgmental values 
concerning their appropriateness relative to specific objectives concerned with human development. 

Awards, citations, honors, and related indicators of distinctive or creative performance: frequency of occurrence 
of judgments of merit in terms of scaled values. 

Books: numbers checked out of library, numbers renewed, numbers reported read when reading is required or 
when voluntary. 

Case histories: critical incidents and other passages reflecting quantifiable categories of behavior. 
Changes in program or in teacher as requested by student: frequency of occurrence. 

Choices expressed or carried out: vocational, avocational, and educational (especially in relation to their judged 
appropriateness to known physical, intellectual, emotional, social, aesthetic, interest, and other factors). 

Citations: commendatory in both formal and informal media of communication such as in the newspaper, televi- 
sion, school assembly, classroom, bulletin board, or elsewhere (see Awards). 

"Contracts": frequency or duration of direct or indirect communications between persons observed and one or 
more significant others with specific reference to increase or decrease in frequency or to duration relative to 
selected time intervals. 

Disciplinary actions taken: frequency and type. 

Dropouts: numbers of students leaving school before completion of program of studies. 

Elected positions: numbers and types held in class, student body, or out-of-school social groups. 

Extracurricular activities: frequency or duration of participation in observable behaviors amenable to classifica- 
tion such as taking part in athletic events, charity drives, cultural activities, and numerous service-related 
avocational endeavors. 

Grade placement: including numbers of recommended units of course work in academic as well as in non-college 
preparatory programs. 

Grouping: frequency and/or duration of moves from one instructional group to another within a given class 
grade. 

Homework assignments: punctuality of completion, quantifiable judgments of quality such as class marks. 
Leisure activities: numbers and types of; times spent in; awards and prizes received in participation. 
Library card: possessed or not possessed; renewed or not renewed. 
Load: numbers of units or courses carried by students. 

Peer group participation: frequency and duration of activity in what are judged to be socially acceptable and 
socially undesirable behaviors. 



Performance: awards, citations received; extra-credit assignments and associated points earned; numbers of books 
or other learning materials taken out of the library; products exhibited at competitive events. 

Recommendations: numbers of and judged levels of favorableness. 

Recidivism by students: incidents (presence or absence or frequency of occurrence) of a given student's returning 
to a probationary status, to a detention facility, or to observable behavior patterns judged to be socially undesir- 
able (intoxicated state, dope addiction, hostile acts including arrests, sexual deviation). 

Referrals: by teacher to counselor, psychologist, or administrator for disciplinary action, for special aid in over- 
coming learning difficulties, for behavior disorders, for health defects or for part-time employment activities. 

Referrals: by student himself (presence, absence, or frequency). 

Service points: numbers earned. 

Skills: demonstration of new or increased competencies such as those found in physical education, crafts, home- 
making, and the arts that are not measured in a highly valid fashion by available tests and scales. 

Social mobility: numbers of times student has moved from one neighborhood to another and/or frequency which 
parents have changed jobs. 

Tape recordings: critical incidents contained and other analyzable events amenable to classification and 
enumeration. 

Tardiness: frequency of. 

Transiency: incidents of. 

Transfers: numbers of students entering school from another school (horizontal move). 

Withdrawal: numbers of students withdrawing from school or from a special program (see Dropouts). 

IV. Indicators of Status or Change in Cognitive and Affective Behaviors of Teachers and other School Personnel in Relation 
to the Evaluation of School Programs. 

Articles: frequency and types of articles and written documents prepared by teachers for publication or 
distribution. 

Attendance: frequency of, at professional meetings or at inservice training programs, institutes, summer schools, 
colleges and universities (for advanced training) from which inferences can be drawn regarding the professional 
person's desire to improve his competence. 

Elective offices: numbers and types of appointments held in professional and social organizations. 

Grade point average: earned in postgraduate courses. 

Load carried by teacher: teacher-pupil or counselor-pupil ratio. 

Mail: frequency of positive and negative statements in written correspondence about teachers, counselors, 
administrators, and other personnel. 

Memberships including elective positions held in professional and community organizations: frequency and 
duration of association. 

Model congruence index: determination of how well the actions of professional personnel in a program approxi- 
mate certain operationally-stated judgmental criteria concerning the qualities of a meritorious program. 

Moonlighting: frequency of outside jobs and time spent in these activities by teachers or other school personnel. 

Nominations by peers, students, administrators, or parents for outstanding service and/or professional 
competencies: frequency of. 

Rating scales and check lists (e.g., graphic rating scales or the semantic differential) of operationally-stated dimen- 
sions of teachers' behaviors in the classroom or of administrators' behaviors in the school setting from which 
observers may formulate inferences regarding changes of behavior that reflect what are judged to be desirable 
gains in professional competence, skills, attitudes, adjustment, interests, and work efficiency; the perceptions of 
various members of the total school community (parents, teachers, administrators, counselors, students, and class- 
ified employees) of the behaviors of other members may also be obtained and compared. 

Records and reporting procedures practiced by administrators, counselors and teachers: judgments of adequacy 
by outside consultants. 

Termination: frequency of voluntary or involuntary resignation or dismissals of school personnel. 

Transfers: frequency of requests of teachers to move from one school to another. 

V. Indicators of Community Behaviors in Relation to the Evaluation of School Programs 

Alumni participation: numbers of visitations, extent of involvement in PTA activities, amount of support of a 
tangible (financial) or a service nature to a continuing school program or activity. 

Attendance at special school events, at meetings of the board of education, or at other group activities by 
parents: frequency of. 

Conferences of parent-teacher, parent-counselor, parent-administrator sought by parents: frequency of request. 

Conferences of the same type sought and initiated by school personnel: frequency of requests and record of 
appointments kept by parents. 

Interview responses amenable to classification and quantification. 

Letters (mail): frequency of requests for information, materials, and servicing. 

Letters: frequency of praiseworthy or critical comments about school programs and services and about the 
personnel participating in them. 

Participant analysis of alumni: determination of locale of graduates, occupation, affiliation with particular 
institutions, or outside agencies. 

Parental response to letters and report cards upon written or oral request of school personnel: frequency of 
compliance by parents. 

Telephone calls from parents, alumni, and from personnel in communications media (e.g., newspaper reporters): 
frequency, duration, and quantifiable judgments about statements monitored from telephone conversations. 

Transportation requests: frequency of. 

Introductory Note - Cluster 4: "Attitudes, Judgments, and Opinions" 



As contemporary events clearly document, evaluation is, above all, a political endeavor. Political action is the basis for 
implementing reform movements in a complex, democratic society, and the call for increased accountability cannot be 
isolated from a widely-held desire that education should be changed. 

The first of two items in this concluding section deals with controversies generated by performance contracting. It is fol- 
lowed by materials related to analysis of the attitudes we hold about evaluation, for those who assess and ultimately judge are 
not without their own biases. We cannot escape them; but we can discover more information about our beliefs as well as 
those of others. 

GAIN SCORE ERRORS IN PERFORMANCE CONTRACTING 



Robert E. Stake and James L. Wardrop 
University of Illinois at Urbana-Champaign 



(Editor's Note: The major concerns which underlie the comments appearing in this item focus on the use of individual gain 
scores as a basis for payment to a performance contractor. The authors' principal reservation is with the erroneous belief 
(accepted by many school boards) that short-term achievement of individual students can be measured reliably by standardized 
achievement tests. According to their paper, one should expect that 25 percent of the students will show a year's gain in 
achievement entirely due to the error in such tests. 

According to Stake and Wardrop's figures, if students were tested and retested on a parallel form of the test the next day, 
one should expect to have one child in four grow "miraculously" a year or more in achievement! Thus, the performance con- 
tractor whose basic fee can be covered by payment for one out of four students having "grown a year or more," as measured 
by a standardized test, may be in a no-risk business, due to the schoolmen's lack of knowledge of standardized test 
reliability.) 

Recent efforts to evaluate the effectiveness of instruction, particularly in performance contracting, reflect confidence that 
short-term achievement can be measured reliably by standardized achievement tests. Contracts such as the Dorsett-Texarkana 
contract (Andrew and Roberts, 1970) pay off on an individual-student basis. The contractor typically is to be reimbursed for 
each student who gains more than a specified amount. The student is to be tested, trained, and retested with a carefully 
normed, commercially published achievement test. For such tests, and for these only, scores can be reported in terms of the 
grade equivalent, a publicly interpretable indicator of student academic standing. The contract, for example, might call for 
termination of remedial training when the student shows a one-year gain in his grade-equivalent score. 

Lord (1956) and Webster and Bereiter (1963) have demonstrated that such gain scores are unreliable. At present, this 
unreliability of gain scores (on two parallel forms of the same test) is such that in reading, for example, we should expect 25% 
of the students to show a year's gain in achievement merely as an artifact of the error in testing. (Of course, 25% would also 
show a year's loss.) Nevertheless, with encouragement from the U. S. Office of Education, school districts are contracting 
with commercial firms for instruction with reimbursement based on individual gain scores. It appears to us that this criterion 
should be challenged by specialists in educational measurement. 

Consider the error in these grade-equivalent scores. For a typical standardized test, the Technical Manual provides such 
information as the following: 

Reliability of each of two parallel forms equals .84 
Intercorrelation between scores on the two forms equals .81 
Standard deviation for either form (grade equivalents) equals 2.7 years 

Using these data, we may apply the conventional formula (Thorndike and Hagen, 1961, p. 192) and find the reliability of 
the difference between each student's scores on the two forms: 

Reliability of difference scores equals .16 

We find the standard deviation of those differences in the usual way (e.g., Glass and Stanley, 1970, p. 128): 

Standard deviation of difference scores equals 1.66 years 

Knowing the standard deviation (SD) and the reliability (R) of these difference scores, we can find the standard error (SE) 
of those differences as SE equals SD times the square root of (1 - R): 

Standard error of the difference equals 1.52 years 
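
The three figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that both forms have the same reliability and the same standard deviation, as in the manual data quoted above, and uses the usual equal-variance formulas for difference scores; the function and variable names are illustrative only.

```python
import math

# Values quoted from the Technical Manual example in the text
REL_FORM = 0.84   # reliability of each parallel form
R_FORMS = 0.81    # intercorrelation between the two forms
SD_FORM = 2.7     # standard deviation of either form, in grade-equivalent years

def difference_score_stats(rel_form, r_forms, sd_form):
    """Return (reliability, SD, standard error) of form-to-form difference scores.

    Assumes the two forms share one reliability and one standard deviation.
    """
    # Reliability of a difference: (mean reliability - intercorrelation) / (1 - intercorrelation)
    rel_diff = (rel_form - r_forms) / (1 - r_forms)
    # SD of a difference of two equal-variance scores: SD * sqrt(2 * (1 - r))
    sd_diff = sd_form * math.sqrt(2 * (1 - r_forms))
    # Standard error: SD * sqrt(1 - R)
    se_diff = sd_diff * math.sqrt(1 - rel_diff)
    return rel_diff, sd_diff, se_diff

rel_diff, sd_diff, se_diff = difference_score_stats(REL_FORM, R_FORMS, SD_FORM)
print(f"reliability of differences = {rel_diff:.2f}")
print(f"SD of differences          = {sd_diff:.2f} years")
print(f"SE of differences          = {se_diff:.2f} years")
```

Carrying unrounded intermediate values gives a standard error of about 1.53 years; the 1.52 quoted above follows from rounding the reliability to .16 and the standard deviation to 1.66 before the last step.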

The probable error (PE) is about 1.0 years (PE equals .6745 SE). On the average, errors for 50% of the students in a group 
would exceed the probable error. That is, approximately one student in four would show a "gain" (and one in four would 
show a "loss") of at least one year when tested with one form of the test and then retested with the other form simply as a 
result of the errors of measurement of the test. Here are two more ways to express this result in the typical performance- 
contracting situation (we are ignoring the gain that might result simply from exposure to the pretest): 

Suppose that three students were to be tested with a parallel form immediately after the pretest. The chances are better 
than 50-50 that at least one of the three would have gained a year or more and appear ready to graduate from the pro- 
gram. 

Suppose that 100 students were admitted to contract instruction and pretested. After a period of time involving no 
training, they were tested again and the students "gaining" a year were graduated. After another period of time without 
training, another test and another graduation occur. After the fourth such "terminal" testing, even though no instruction 
had occurred, the chances are better than 50-50 that two-thirds of the students would have graduated. 
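
Both illustrations follow from the one-in-four chance of a spurious one-year gain. A minimal sketch of the arithmetic is given below; it assumes, as a simplification not stated in the paper, that errors are independent from student to student and from one testing to the next. The function names are hypothetical.

```python
import math

P_SPURIOUS_GAIN = 0.25  # from the text: one student in four "gains" a year by error alone

def prob_at_least_once(trials, p=P_SPURIOUS_GAIN):
    """Chance that an event of probability p occurs at least once in independent trials."""
    return 1 - (1 - p) ** trials

def prob_at_least_k(n, k, p):
    """Binomial chance that at least k of n independent students meet the criterion."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Three students retested at once: chance that at least one appears to gain a year
p_one_of_three = prob_at_least_once(3)

# One student over four successive testings: chance of having "graduated" by the fourth
p_grad_by_four = prob_at_least_once(4)

# 100 students: chance that at least 67 (two-thirds) have graduated by the fourth testing
p_two_thirds = prob_at_least_k(100, 67, p_grad_by_four)

print(f"at least one of three gains a year: {p_one_of_three:.3f}")
print(f"a student graduates within four testings: {p_grad_by_four:.3f}")
print(f"two-thirds of 100 graduate: {p_two_thirds:.3f}")
```

Under these assumptions each probability exceeds one-half, in line with the "better than 50-50" statements.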

Obviously the unreliability of gain scores in such circumstances will assure the appearance of learning even when there is 
no learning at all. 

To reduce the magnitude of these errors would require increasing the reliability or decreasing the standard deviation of the 
test forms. (Verify for yourself that the intercorrelation between test forms will have no effect on the probable error of the 
difference scores.) Increasing test reliability offers little hope, for in our example increasing the reliability to .96 would leave 
us with a probable error of more than one-half year. Nor is it reasonable to reduce the standard deviation by seeking a more 
homogeneous reference group, for the increased homogeneity would also lower reliability. The problem is not one of reference 
group; it is a problem of validity. The conventional achievement test does not have the necessary content validity for 
individual student assessment. For years test authors and test publishers have cautioned users against using these tests as 
diagnostic instruments. Performance-contract criterion tests should, in effect, be diagnostic tests. 
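
The invitation to verify that the intercorrelation has no effect can be taken up numerically. The sketch below reuses the equal-variance formulas from the example and holds the form reliability and standard deviation fixed while the intercorrelation varies; it then checks the claim about a reliability of .96. The function name is hypothetical, and equal reliabilities and standard deviations for the two forms are assumed.

```python
import math

def probable_error_of_difference(rel_form, r_forms, sd_form):
    """Probable error (0.6745 * SE) of the difference between two parallel-form scores."""
    rel_diff = (rel_form - r_forms) / (1 - r_forms)   # reliability of the difference
    sd_diff = sd_form * math.sqrt(2 * (1 - r_forms))  # SD of the difference
    se_diff = sd_diff * math.sqrt(1 - rel_diff)       # standard error of the difference
    return 0.6745 * se_diff

# Same reliability (.84) and SD (2.7 years), different intercorrelations
for r in (0.50, 0.70, 0.81):
    pe = probable_error_of_difference(0.84, r, 2.7)
    print(f"intercorrelation {r:.2f}: probable error = {pe:.3f} years")

# Raising the form reliability to .96 still leaves more than half a year
pe_high_rel = probable_error_of_difference(0.96, 0.81, 2.7)
print(f"reliability .96: probable error = {pe_high_rel:.3f} years")
```

Algebraically the intercorrelation cancels, leaving SE equal to SD times the square root of 2(1 - reliability), which is why only the reliability and the standard deviation of the forms matter.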

Measurement consultants and the school district's evaluator should insist on a criterion procedure that is valid. (Conventional 
reliability is not essential; small measurement error is.) Criterion testing might involve the use of specially developed 
criterion-referenced items, performance simulations, and clinical observations and professional judgments. None of these is 
currently transformable into grade equivalents. Grade equivalents come from standardized tests. If the advantages of stan- 
dardized tests (grade equivalents, content selection by experts, technical editing, objective scoring, ready availability, etc.) are 
desired, the contract should be based on group performance rather than individual-student performance scores. 

There are other hazards in measuring and interpreting such scores. Regression effects, inappropriate control groups, 
unwarranted similarities and dissimilarities in teaching and testing materials, misrepresentation of objectives, and unwarranted 
extrapolation are some that Stake (1971) and Wardrop (1971) have described. 

This brief look at the unreliability of gain scores does not indicate whether performance contracting is an appropriate 
remedy for a district's instructional weaknesses. Standardized tests continue to be valid for discriminating among students 
and among districts for various educational purposes. This look at the unreliability of gain scores does indicate that 
individual-student gain on a currently available standardized test should not be used as a criterion of successful instruction. 

EVALUATION EXPECTATIONS 



The Director of Research is talking with the Associate Superintendent for Instruction. They are talking about their new pro- 
grams in reading. 

Director of Research: "I think if we switch our end-of-the-year test to an inventory with better content validity, we will find 
out whether or not we should continue these programs. That change and the addition of two more control groups getting the 
traditional materials will put us in a much more defensible position insofar as our evaluation plan is concerned." 

Associate Superintendent: "That's a good idea, but I know our teachers need help in using the new materials and the parents 
of our children really don't know what the program is all about. When the parents and teachers understand and support a 
program, its effectiveness increases considerably." 

How is it possible for these two educators to realize that they have different viewpoints about evaluating curricula? What does 
each of them expect that an evaluation study will do for the reading programs? 

One way of examining their different viewpoints is with the help of the CIRCE Attitude Scale 1.3. Through the use of this 
scale, it is possible to create a profile for both the Associate Superintendent and the Research Director which will enable 
them to compare themselves and their responses to individual items. 

After taking and self-scoring the 48-item inventory, the Associate Superintendent and the Research Director could each 
sketch his profile by connecting his responses with a penciled line. You can create their profiles by doing just that. An 
illustration follows on the next page. 

(in evaluation) 



The CIRCE Attitude Scale 1.3 is not yet fully developed but early tryouts indicate that it has some validity and that it helps 
people talk about their viewpoints toward remedying this failure to communicate. 



CIRCE Attitude Scale No. 1.4 Name 

Attitudes toward Educational Evaluation. Below are a number of statements about the evaluation of educational programs. A 
program can be a lesson, a course, a whole curriculum, or any training activity. Consider each statement as a statement of 
opinion. If you agree at least a little bit with the statement, circle the letter A. If you disagree even a little bit with the 
statement, circle the letter D. If you both agree and disagree, or if you have no opinion, leave the letters uncircled. 

A = AGREE D = DISAGREE Blank = Neither 



1. 


A 


D 


The major purpose of an educational evaluation study should be to gather information that will be helpfuf 
to the educators. 


9 


A 
rA 


n 
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programs. 


4, 


A 

A 


n 


The evaluator should accept the responsibility of finding the strongest, mcst defensible, and publicly attrac- 
tive points of the program. 


5, 


A 


D 


In evaluating a program, it is at least as important to study and report on the types of teaching as it is to 
study and report on the amount of learning. 


6. 


A 


D 


The evaluator should draw a conclusion as to whether or not the goals of the program are worthwhile. 


7 


A 

M 


n 


it is more important to evaluate a program in comparison to what other programs do than to evaluate it 
with reference to what its objectives say it should do. 


Q 
O. 


A 
A 


u 


Principals and superintendents should not gather data about the quality of instruction in the classroom. 


Q 

y. 


A 

A 


n 
u 


The task of putting educational objectives into writing is more the responsibility of the evaluator than that 
of the educator. 


10. 


A 

A 


U 


It is essential that the full array of educational objectives be stated before the program begins. 


11. 


A 


D 


Evaluation studies would improve if they gathered more kinds of information, even if at the expense of 
gathering less reliable information. 


1 o 
1 z. 


A 

A 


rv 
U 


Evafuators should ignore data that cannot be objectively verified. 




A 

A 


pi 

u 


Education should have more of an engineering orientation than it now has. 




A 
A 


n 
u 


The job of an evaluator is mostly one of finding out how well students learn what they are supposed to 
learn. 


15. 


A 


D 


Evaluation should aid an educator in revising his goals even while the program is in progress. 


16. 


A 


D 


The process of decision-making about the curriculum is one of the weakest links in the present operation of 
the schools. 


17. 


A 


D 


Educators have some important aims that cannot be stated adequately by anyone in terms of student 
behaviors. 


18. 


A 


D 


Information from an evaluation study is not worth the trouble it makes. 


19. 


A 


D 


The first job in instruction is the formulation of a statement of objectives. 


20. 


A 


D 


A teacher should tell his students any and all of his teaching objectives. 


21. 


A 


D 


The major purpose of educational evaluation is to find out the worth of what is happening. 


22. 


A 


D 


The evaluator should be a facilitator more than a critic or reformer or scholar. 


23. 


A 


D 


Some school experiences are desirable because they round out a child's fife-whether or not they increase his 
competence or change his attitudes. 



ERIC 



39 



24. 


A 


D 


An evaluator should find out if the teaching Is In fact the kind that the school faculty expects It to be. 


25. 


A 


D 


Whether or not an evaluation report Is any good should be decided pretty much on the same grounds that 
research journal editors use to decide whether or not a manuscript should be published. 


26. 


A 


D 


The main purpose of evaluation Is to gain understanding of the causes of good instruction. 


27. 


A 


D 


Description and value judgment are equally Important components of evaluation. 


28. 


A 


D 


in conducting an evaluation, there Is no justification for the exercise of subjective judgment of any kind by 
the evaluator. 


29. 


A 


D 


Educational evaluation is a necessary step in the everyday operation of the school. 


30. 


A 


D 


The strategy of evaluation should be chosen primarily in terms of the particular needs the sponsors have for 
evaluation data. 


31. 


A 


D 


The educational evaluator should attempt to conceal all of his personal judgment of the worth of the 
program he is evaluating. 


32. 


A 


D 


The sponsor of an evaluation should have the final say-so in choosing or eliminating variables to be studied. 


33. 


A 


D 


The main purpose of educational evaluation is to find out what methods of instruction work for different 
learning situations. 


34. 


A 


D 


Parents' attitudes should be measured as part of the evaluation of school programs. 


35. 


A 


D 


An evaluator finds it almost impossible to do his job without intruding upon the operation of the program 
at least a little. 


36. 


A 


D 


All important educational aims can be expressed in terms of student behaviors. 


37. 


A 


D 


Some educational goals are best expressed in terms of teacher behaviors. 


38. 


A 


D 


It is essential that evaluation studies be designed so that the findings are generalizable to other curricula. 


39. 


A 


D 


An evaluation study should pay less attention to the statistical significance of a finding than an instruc- 
tional research study would. 


40. 


A 


D 


Evaluation interferes with the running of schools more than it helps. 


41. 


A 


D 


Little evaluation planning can be done before you get a statement of instructional objectives. 


42. 


A 


D 


The leader of an evaluation team should be a teacher. 


43. 


A 


D 


The entire school day and the entire school experience should be divided up and assigned to the pursuit of 
stated educational goals. 


44. 


A 


D 


An evaluation of an educational program should include a critical analysis of the value of the goals of the 
program. 


45. 


A 


D 


Every teacher should have formal ways of gathering information about the strengths and shortcomings of 
his instructional program. 


46. 


A 


D 


Money spent on evaluation contributes more to the improvement of education than any other expenditure. 


47. 


A 


D 


There just is no way that careful and honest evaluation can hurt a school program. 


48. 


A 


D 


If an evaluation study is well designed, the primary findings are likely to improve decisions made by administrators, 
teachers, and students themselves. 


49. 


A 


D 


When the evaluator has to choose between helping this staff run its program better and helping educators 
everywhere understand all programs a little better, he should choose the latter. 
Linking Section 5 - An Interpretive Note 



The reluctance of educators to view evaluation as a reliable aid in decision-making is often reinforced by the esoteric terminology 
which they see as synonymous with the field. 

The Glossary of Terms in Part VI is not an exhaustive compilation but rather is a selection of pertinent terms taken from a 
much larger source. It should assist readers of the Resource Unit in making the widest possible use of materials found here. 
That span of utility includes attempts to anticipate future problems in education. For example, the current interest in perfor- 
mance or behavioral objectives is likely to result in decreased emphasis on content coverage, an outcome which has received 
little attention to date. 
PART VI - A GLOSSARY OF TERMS 



ACCOUNTABILITY 

In evaluation, the process during which one provides relevant audiences with descriptions and/or explanations for which 
one is responsible. 

ACHIEVEMENT AGE 

The age for which a given score on an achievement test is the real or estimated average. Also called educational age or subject 
age. 

ACHIEVEMENT TEST 

An ability test designed to assess the amount an individual has learned in a specified subject area as a result of past 
experience or training. 

AFFECTIVE DOMAIN 

One of three major categories for classifying educational goals. According to Krathwohl, Bloom, and Masia, the affective 
domain consists of "objectives which emphasize a feeling tone, an emotion, or a degree of acceptance or rejection. Affective 
objectives vary from simple attention to complex but internally consistent qualities of character and conscience." Included in 
the taxonomy of the affective domain are the following five major categories: (1) Receiving (Attending), (2) Responding, (3) 
Valuing, (4) Organization, and (5) Characterization by a Value or Value Complex. Sub-categories are also available. 

AGE NORMS 

Values representing the chronological age at which a given level of behavioral development is normally, or on the average, 
attained and to which test scores are sometimes converted. Essentially a norming system based on age equivalents. 
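
The conversion of a test score to an age equivalent can be sketched in Python as follows. This is only an illustration: the function name, the norm table, and the use of straight-line interpolation between tabled ages are assumptions made for the example, not a procedure taken from any published test.

```python
def age_equivalent(score, norms):
    """Return the age (in months) at which `score` is the average score.

    `norms` is a list of (age_in_months, average_score) pairs sorted by age.
    Assumption: average scores rise strictly with age, and values between
    tabled ages are estimated by straight-line interpolation.
    """
    for (age_lo, s_lo), (age_hi, s_hi) in zip(norms, norms[1:]):
        if s_lo <= score <= s_hi:
            fraction = (score - s_lo) / (s_hi - s_lo)
            return age_lo + fraction * (age_hi - age_lo)
    return None  # the score falls outside the norm table


# Invented norm table: average raw score at ages 6, 7, and 8 years.
norms = [(72, 20), (84, 30), (96, 38)]
print(age_equivalent(25, norms))  # 78.0
```

A raw score of 25 lies halfway between the invented averages for 72 and 84 months, so it converts to an age equivalent of 78 months.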

ANECDOTAL RECORD 

Usually, a written account describing an observation of an event or an incident of an individual's behavior. Anecdotes 
may also be transcribed on tape recorders or by other audio-visual means. The anecdote typically should contain a description 
of what the individual did and said as well as a description of the situation in which the behavior occurred. Under usual condi- 
tions one would record incidents which appear relevant for an understanding of the individual, either as being atypical or 
usual. 

APTITUDE 

The capacity or extent to which an individual may be expected to acquire a particular kind of ability. It also may be 
defined as an ability to learn or a readiness for learning. 

APTITUDE TEST 

An ability test designed to assess or, more precisely, to permit predictions of what the individual, under standard condi- 
tions, can learn to do. Aptitude tests may be placed in one of two categories: measures of general scholastic aptitude (the 
general intelligence test) and measures of specific aptitudes (music, art, foreign language, etc.). 

ASSESSMENT 

Any of a number of procedures for making relevant evaluations or differentiations among individuals or groups in respect 
to any characteristic, attribute, or product. 

BEHAVIORAL OBJECTIVES 

Educational goals which are stated in observable terms and reflect what the student will be like or will be able to do after 
instruction. 



Adapted from Referenced Glossary of Terms in Evaluation and Measurement, Alan Ross Coller, ERIC Clearinghouse on 
Early Childhood Education, University of Illinois, Champaign-Urbana, Illinois 61801. 
CHECKLISTS 

Instruments specifically designed to collect and record relatively unrefined judgments systematically. An observer indi- 
cates, usually by use of checkmarks on a printed form, whether or not the person, place, thing, or event-real or abstract-has 
the characteristic as specified by the checklist item. Data from checklists are recorded as if from a yes or no form and should 
be contrasted against data from the more refined rating scales which are in the form of scores representing points along a 
continuum. 

CHRONOLOGICAL AGE (CA) 

The real-life age of an organism as expressed in some unit of time; the time elapsed since birth. In measurement, chronological 
age (CA) is most conveniently expressed in units of months, which often calls for the individual's real age to be 
modified to fit the requirements of the particular test. For example, a child who is 5 years, 2 months, and 17 days old may be 
assigned the CA of 62 months in one test and 63 months in another test. 
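
The two values in this example are what different rounding conventions would produce. A minimal Python sketch, assuming (hypothetically) that one test drops leftover days while another rounds to the nearest month, counting 15 or more days as a full month; the function name and both conventions are illustrative and are not drawn from any particular test manual:

```python
def chronological_age_months(years, months, days, round_to_nearest=False):
    """Convert an age given as years, months, and days to whole months.

    Assumption: when rounding, 15 or more leftover days count as one month;
    otherwise leftover days are simply dropped.
    """
    total = years * 12 + months
    if round_to_nearest and days >= 15:
        total += 1
    return total


print(chronological_age_months(5, 2, 17))                         # 62
print(chronological_age_months(5, 2, 17, round_to_nearest=True))  # 63
```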

COGNITIVE DOMAIN 

One of the three major categories for classifying educational goals. According to Bloom and his colleagues, "the cognitive 
domain . . . includes those objectives which deal with the recall or recognition of knowledge and the development of intellectual 
abilities and skills." Included in the taxonomy of the cognitive domain are the following six categories: (1) Knowledge, 
(2) Comprehension, (3) Application, (4) Analysis, (5) Synthesis, and (6) Evaluation. 

CRITERION BASED STANDARDS 

An absolute type of standard derived from the mastery performance desired on any given task. To be distinguished from 
empirical norms and estimated norms. 

CRITERION-REFERENCED TESTS 

Any test the interpretation of which is concerned with estimating the degree to which an individual's achievements have 
progressed toward some criterion or standard. Such tests, also called mastery tests (because they are used to evaluate mastery 
learning) are not intended to provide scores that will rank students in terms of their achievements (norm or individually refer- 
enced tests do this); rather they provide "absolute" scores employed to classify students as belonging to either one of at least 
two groups-those whose achievements resemble criterion achievement, and those whose achievements do not. These tests are 
frequently used in conjunction with programmed instruction since these tests make specific learning difficulties relatively 
easy to diagnose. 
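
The two-group classification described here can be sketched in Python. The cutoff, the student names, and the scores are invented for illustration, and the function name is hypothetical; the point is only that each score is compared with a fixed criterion rather than with other students' scores.

```python
def classify_mastery(scores, criterion):
    """Split students into those who reach the criterion and those who do not.

    `scores` maps each student to a raw score; `criterion` is the fixed
    ("absolute") score that represents mastery.
    """
    mastered = [name for name, s in scores.items() if s >= criterion]
    not_yet = [name for name, s in scores.items() if s < criterion]
    return mastered, not_yet


# Invented data: raw scores on a 20-item test, mastery set at 17 correct.
scores = {"Ann": 18, "Ben": 14, "Cal": 19, "Dee": 11}
mastered, not_yet = classify_mastery(scores, criterion=17)
print(mastered)  # ['Ann', 'Cal']
print(not_yet)   # ['Ben', 'Dee']
```

No ranking is produced: Ann and Cal fall in the same group even though their scores differ.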

CRITERION-RELATED VALIDITY 

A general term used to refer to those measures of validity which are employed to determine the degree of association 
between performances on tests designed either to forecast an individual's future standing or to estimate his current standing on a 
variable different from the tests, and the variable itself. If the test is to be used to forecast the individual's future 
standing on some dimension, we are concerned with predictive validity. If the test is to be used to determine the individual's 
current standing in respect to some dimension, we are concerned with concurrent validity. Criterion-related validity is also 
called empirical validity. 
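
One common way to express such a degree of association is a correlation coefficient between test scores and the criterion variable. The Python sketch below assumes the Pearson product-moment correlation is the index chosen; the glossary itself does not name a particular statistic, and the data are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


# Invented data: aptitude test scores and grade averages earned a year later.
test_scores = [10, 12, 15, 18, 20]
later_grades = [2.1, 2.0, 3.0, 3.4, 3.9]

r = pearson_r(test_scores, later_grades)
print(round(r, 2))  # a value near +1 suggests strong predictive validity
```

Because the criterion here is measured after the test, the coefficient bears on predictive validity; the same calculation with a criterion measured at the time of testing would bear on concurrent validity.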

DESCRIPTIVE STATISTICS 

Those methods employed to describe the characteristics of all those members of a group for which data is available. 
Descriptive statistics may be employed to describe populations (e.g. the U.S. Census employs such statistics to count and 
categorize the population) and samples drawn from populations. However, in the latter instance, the sample itself is regarded as the 
group being described, not as representative of the population from which it was drawn. Inferential statistics are employed to make 
inferences about some larger group on the basis of samples. 
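
The distinction can be illustrated with a short Python sketch using the standard `statistics` module. The scores are invented, and the standard error is shown only as one example of a quantity on which inferential statements are usually built.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

scores = [72, 85, 90, 66, 78]  # invented scores for five students

# Descriptive use: these five students ARE the group being described.
group_mean = mean(scores)
group_spread = pstdev(scores)  # population standard deviation (divisor n)

# Inferential use: the five students are treated as a sample from a larger
# group, so the spread is estimated with the n - 1 divisor and a standard
# error expresses the uncertainty of the sample mean.
sample_spread = stdev(scores)
standard_error = sample_spread / sqrt(len(scores))

print(group_mean)  # 78.2
```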

DIAGNOSTIC EVALUATION 

A type of evaluation designed either for purposes of student placement or for determining the underlying causes for learn- 
ing deficiencies in the student. When diagnostic evaluation is performed prior to instruction it is usually intended for student 
placement; when performed during instruction its main function is the discovery of the causes for learning difficulties. 

DIAGNOSTIC TEST 

A type of test designed to identify problem areas. Diagnostic achievement tests are used to pinpoint specific areas of 
subject-matter strength and weakness, and to determine the exact nature of the strengths and weaknesses. When strengths and 
weaknesses have been diagnosed, remedial action may be instituted. Diagnostic tests may be contrasted with survey tests 
which are designed to provide only a general overview of subject-matter competence. 
EDUCATIONAL MEASUREMENT 

Refers to the employment of data-gathering techniques in order to collect precise and more or less objective descriptive 
information about the educational process. The more parochial view regards measurement as only a part, although an important 
part, of the total process of evaluation. This latter view treats the results of measurement as numbers which describe or 
express something about the characteristics observed and evaluation as the process by which judgments about these descrip- 
tive numbers are made. 

EDUCATIONAL OBJECTIVES 

A statement of instructional purpose which describes the ways in which students are to be changed by their interaction 
with the educational situation. Educational objectives would communicate the educator's intent and be expressed in terms of 
observable student behavior and content areas. It also has been suggested that educational objectives should as well contain a 
description of how the student is to demonstrate that he has achieved the objective and the important conditions under 
which the behavior is expected to occur and the accuracy of the anticipated performance. 



EMPIRICAL 

Based upon a factual investigation or experience rather than upon reason or theory. An empirical evaluation, for example, 
is one in which one might try to find out how much students actually learned from various textbooks. A non-empirical evalu- 
ation might be composed of a group of teachers analyzing the contents of several texts to determine from which of them the 
students would best learn. 

EMPIRICAL NORMS 

In measurement, standards based upon the responses of a representative sample of examinees of the type for whom the 
test was constructed. Such standards, also called norms, are likely to occur with published tests. To be distinguished from 
estimated norms and criteria-based performance. 

FACE VALIDITY 

A judgment of whether a test or technique measures what it looks like it should measure. The cooperativeness of the 
examinee may be affected by the degree to which test items appear to be related to the stated aims of the test. If the test 
items seem unreasonable the examinee may not be adequately motivated to do his best. Unlike other estimates of validity, 
face validity does not ordinarily result in a numerical index but rather a qualitative one. 

FACTOR ANALYSIS 

Any of a set of procedures employed to analyze the interrelationships among a set of variables. The process assesses 
the proportion of variance of each variable that is associated with a limited set of factors. It rests on an assumption of 
commonality: that different variables have many aspects in common, that they may be measuring the same things, etc. 

FORMATIVE EVALUATION 

Systematic evaluation that occurs before the terminating point of a segment of instruction, that is, during the learning 
process when the student is undergoing change. Such evaluations are useful for curriculum construction, teaching, and 
learning. For example, formative evaluation procedures may be employed to indicate areas in which remediation is needed so 
that instruction can be so modified. Formative evaluation is distinguished from summative evaluation in that it occurs during 
the operational sequence of a program or project. "Feedback" is part of formative evaluation. 

GENERAL INTELLIGENCE TEST 

A general aptitude type of ability test, sometimes called a general ability test, employed in the prediction of scholastic 
ability. Existing tests are thought to be composed of both crystallized and fluid general ability items. 

INFERENTIAL STATISTICS 

Those methods employed to make estimates or inferences concerning the characteristics of a larger group on the basis of 
only a sample from that group. Inferential statistics is to be distinguished from descriptive statistics, which is primarily concerned 
with describing the characteristics of a group. 

NORM-REFERENCED TESTS 

Any test, the interpretation of which is primarily concerned with estimating the level of an individual's achievements in 
respect to other individuals-the norming population. The individual's "relative" standing, as contrasted to the "absolute" 
standard employed in criterion-referenced tests, is the essential purpose of the measurement. Such tests, also called 
individually-referenced tests, are usually composed of items adjusted to a 50-60% item difficulty level. In contrast, 
criterion-referenced tests usually have item difficulty levels of about 85%. 
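
The sketch below assumes the conventional meaning of item difficulty level, namely the proportion of examinees who answer the item correctly; the glossary does not define the term, and the response data and function name are invented for illustration.

```python
def item_difficulty(responses):
    """Return, for each item, the proportion of examinees answering correctly.

    `responses` holds one list per examinee, with 1 for a correct answer
    and 0 for an incorrect one; every examinee answers the same items.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(examinee[i] for examinee in responses) / n_examinees
        for i in range(n_items)
    ]


# Invented data: four examinees, three items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
]
print(item_difficulty(responses))  # [1.0, 0.5, 0.5]
```

Under this reading, the second and third items (50% correct) fall at the lower edge of the 50-60% range cited for norm-referenced tests, while the first item, which every examinee answers correctly, does nothing to separate one examinee from another.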
NORMS 

Typically, represent the average performance level on tests or in activities with which individuals or groups may be 
compared. In the school situation, norms are usually employed to indicate average performances vis-a-vis different age and 
grade levels. In evaluation, the comparison of local observations with norms forms the basis of the judgmental process. 

PERFORMANCE CHECKLIST 

A type of checklist designed especially to collect and record relatively unrefined judgments concerned with the behavior or 
performance of individuals or groups. Such checklists, also called behavior checklists, require the observer to simply indicate, 
usually by checkmarks, whether the behavior of the individual or group concerned has the characteristics as specified by the 
checklist item. 

BEHAVIOR RATING SCALE 

A type of rating scale designed especially to collect and record systematic and refined judgments concerned with the 
behavior (or performance, or characteristics, or traits) of individuals or, in some cases, groups. While the basic form of behav- 
ior rating scales may vary to a considerable extent, all such instruments employ underlying behavioral continuums and rely 
upon a rater to make observations of the individual being rated. The behavior in question may be cognitive, affective, or 
psychomotor. 

SITUATION TEST 

A procedure whereby the behavior of all individuals (or small group of persons) is observed while performing in a more or 
less structured situation. Some situations call for role playing as in the psychodrama technique or the more modern encounter 
groups. Other situations may call for discussions, problem solving, or cooperative ventures. 

SUMMATIVE EVALUATION 

Systematic evaluation that takes place after the termination of an instructional segment; i.e., at the end of a unit, chapter, 
course, or semester. Summative evaluation generally requires judgments; its primary goals are grading, selecting, certifying, 
providing feedback, judging teacher effectiveness, and comparing curricula. An intermediate type of summative evaluation is 
to be distinguished from a longer-term evaluation. Of the two, the former is more concerned with outcomes that are more 
direct, less generalizable, and less transferable. Summative evaluation is also to be distinguished from formative evaluation, 
which takes place before the termination of an instructional segment. 

TASK ANALYSIS 

The process by which a desired outcome is described and then analyzed into behavioral components which must be 
sequentially developed in order to arrive at the terminal behavior-the desired outcome. A prime purpose of task analysis is to 
apply findings derived from learning theory to the instructional sequence-the set of hierarchical steps. 